Simpson's Paradox

Use admission_data.csv for this exercise.

# Load and view first few lines of dataset
import pandas as pd 
import numpy as np 

df = pd.read_csv("admission_data.csv") 
df.head()

Proportion and admission rate for each gender

df.groupby('gender').admitted.value_counts()

gender  admitted
female  False       183
        True         74
male    False       125
        True        118
Name: admitted, dtype: int64

# Admission rate for females from the counts above: admitted / (admitted + not admitted)
74/(74+183)

0.28793774319066145

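# Proportion of students that are female
# (index by label: value_counts() sorts by count, so positions 0 and 1 are not tied to a gender)
rows = df.shape[0] 
(df.gender.value_counts()/rows)['female']

0.51400000000000001

# Proportion of students that are male
(df.gender.value_counts()/rows)['male']

0.48599999999999999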

# Admission rate for females
len(df[(df.admitted==True)&(df.gender=='female')])/len(df[df.gender=='female'])

0.28793774319066145

# Admission rate for males
len(df[(df.admitted==True)&(df.gender=='male')])/len(df[df.gender=='male'])

0.48559670781893005
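The two filtered ratios above can also be computed in one step, because the mean of a boolean column is the fraction of True values. The following is a minimal sketch under the assumption that `admitted` is a boolean column; it rebuilds a stand-in DataFrame from the counts printed earlier so that it runs without admission_data.csv. The names `counts`, `demo`, and `rate_by_gender` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Counts copied from the groupby output above: (gender, admitted) -> number of rows
counts = {
    ("female", False): 183,
    ("female", True): 74,
    ("male", False): 125,
    ("male", True): 118,
}

# Expand the counts into one row per student (stand-in for df)
demo = pd.DataFrame(
    [{"gender": g, "admitted": a} for (g, a), n in counts.items() for _ in range(n)]
)

# Mean of a boolean column = share of True values = admission rate
rate_by_gender = demo.groupby("gender")["admitted"].mean()
print(rate_by_gender)
```

With the real data, the same line would read `df.groupby('gender').admitted.mean()`.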

total_females = df.gender.value_counts().female
total_males = df.gender.value_counts().male
print(total_females) 
print(total_males) 
print(df.shape)

257
243
(500, 4)

Proportion and admission rate for physics majors of each gender

# What proportion of female students are majoring in physics?
physics = df[df.major=='Physics']
print(physics.shape)
print(physics.gender.value_counts())
print()
print(physics.gender.value_counts().female / total_females)

(256, 4)
male      225
female     31
Name: gender, dtype: int64

0.120622568093

# What proportion of male students are majoring in physics?
physics.gender.value_counts().male / total_males

0.92592592592592593

# Number of admitted physics majors, in total and by gender
total_admitted_for_physics = physics.admitted.sum()
females_admitted_for_physics = len(physics[(physics.admitted==True) & (physics.gender=='female')])
males_admitted_for_physics = len(physics[(physics.admitted==True) & (physics.gender=='male')])

# Share of admitted physics majors who are female (23 of 139).
# This is not the admission rate, which divides by the number of female physics majors.
females_admitted_for_physics / total_admitted_for_physics

0.16546762589928057

# Share of admitted physics majors who are male (116 of 139)
males_admitted_for_physics / total_admitted_for_physics

0.83453237410071945

physics.groupby('gender').admitted.value_counts()

gender  admitted
female  True         23
        False         8
male    True        116
        False       109
Name: admitted, dtype: int64

# Admission rate for female physics majors: admitted / all female physics majors
23/31

0.7419354838709677
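The grouped counts above give the admission rate within physics for both genders: the number admitted divided by all physics majors of that gender. A small sketch of that arithmetic, using only the four counts printed above (the names `physics_counts` and `physics_rate`, and the column label `not_admitted`, are illustrative assumptions):

```python
import pandas as pd

# Counts copied from physics.groupby('gender').admitted.value_counts() above
physics_counts = pd.DataFrame(
    {"admitted": [23, 116], "not_admitted": [8, 109]},
    index=["female", "male"],
)

# Admission rate = admitted / (admitted + not admitted) within each gender
physics_rate = physics_counts["admitted"] / physics_counts.sum(axis=1)
print(physics_rate)
```

The female value reproduces the 23/31 computed by hand; the male value is the corresponding 116/225.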

Proportion and admission rate for chemistry majors of each gender

chem = df[df.major=='Chemistry']
print(chem.shape)
print(chem.gender.value_counts())
print()

(244, 4)
female    226
male       18
Name: gender, dtype: int64

# What proportion of female students are majoring in chemistry?
chem.gender.value_counts().female / df.gender.value_counts().female

0.87937743190661477

# What proportion of male students are majoring in chemistry?
chem.gender.value_counts().male / df.gender.value_counts().male

0.07407407407407407
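The four proportions computed so far (physics and chemistry, for each gender) can be read off a single row-normalized cross-tabulation. This is a hedged sketch that rebuilds a stand-in table from the gender-by-major counts printed above rather than reading admission_data.csv; `major_counts`, `demo`, and `major_share` are illustrative names.

```python
import pandas as pd

# Counts copied from the value_counts() outputs above: (gender, major) -> number of rows
major_counts = {
    ("female", "Physics"): 31,
    ("female", "Chemistry"): 226,
    ("male", "Physics"): 225,
    ("male", "Chemistry"): 18,
}

# Expand the counts into one row per student (stand-in for df)
demo = pd.DataFrame(
    [{"gender": g, "major": m} for (g, m), n in major_counts.items() for _ in range(n)]
)

# normalize="index" divides each row by its total, giving the share of each
# gender that chose each major
major_share = pd.crosstab(demo["gender"], demo["major"], normalize="index")
print(major_share)
```

With the real data, the equivalent call would be `pd.crosstab(df.gender, df.major, normalize='index')`.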

chem.groupby('gender').admitted.value_counts()

gender  admitted
female  False       175
        True         51
male    False        16
        True          2
Name: admitted, dtype: int64

# Admission rate for female chemistry majors
51 /(51+175)

0.22566371681415928

# Admission rate for male chemistry majors
2 /18

0.1111111111111111

Admission rate for each major

df.groupby(['major', 'admitted']).gender.value_counts()

major      admitted  gender
Chemistry  False     female    175
                     male       16
           True      female     51
                     male        2
Physics    False     male      109
                     female      8
           True      male      116
                     female     23
Name: gender, dtype: int64

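# Share of all admitted students who are physics majors (139 of 192).
# This is not the admission rate for physics, which would divide by the 256 physics majors.
len(df[(df.admitted==True) & (df.major=='Physics')]) / len(df[df.admitted==True])

0.7239583333333334

# Share of all admitted students who are chemistry majors (53 of 192)
len(df[(df.admitted==True) & (df.major=='Chemistry')]) / len(df[df.admitted==True])

0.2760416666666667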
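To tie the pieces together, the sketch below computes the admission rate per major, per major and gender, and per gender pooled over majors, using only the eight counts from the major/admitted/gender breakdown above. It is an illustration under stated assumptions, not part of the original notebook: the names `summary`, `by_major`, `by_major_gender`, and `by_gender`, and the column label `not_admitted`, are hypothetical.

```python
import pandas as pd

# Counts copied from df.groupby(['major', 'admitted']).gender.value_counts() above
rows = [
    ("Chemistry", "female", 51, 175),
    ("Chemistry", "male", 2, 16),
    ("Physics", "female", 23, 8),
    ("Physics", "male", 116, 109),
]
summary = pd.DataFrame(rows, columns=["major", "gender", "admitted", "not_admitted"])
summary["students"] = summary["admitted"] + summary["not_admitted"]

# Admission rate within each major: admitted / students in that major
by_major = summary.groupby("major")[["admitted", "students"]].sum()
by_major["rate"] = by_major["admitted"] / by_major["students"]

# Admission rate within each (major, gender) cell
by_major_gender = summary.set_index(["major", "gender"])
by_major_gender["rate"] = by_major_gender["admitted"] / by_major_gender["students"]

# Admission rate per gender, pooled over both majors
by_gender = summary.groupby("gender")[["admitted", "students"]].sum()
by_gender["rate"] = by_gender["admitted"] / by_gender["students"]

print(by_major["rate"])
print(by_major_gender["rate"].unstack("gender"))
print(by_gender["rate"])
```

Within each major the female rate is higher than the male rate, yet the pooled female rate is lower. That reversal is Simpson's paradox: most female students are in chemistry, the major with the lower admission rate, while most male students are in physics.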

# First five rows of the dataset (output of df.head())
   student_id  gender      major  admitted
0       35377  female  Chemistry     False
1       56105    male    Physics      True
2       31441  female  Chemistry     False
3       51765    male    Physics      True
4       53714  female    Physics      True